We address online learning in complex auction settings, such as sponsoredsearch auctions, where the value of the bidder is unknown to her, evolving inan arbitrary manner and observed only if the bidder wins an allocation. Weleverage the structure of the utility of the bidder to provide algorithms withregret rates against the best fixed bid in hindsight, that are exponentiallyfaster in convergence in terms of dependence on the action space, than whatwould have been derived by applying a generic bandit algorithm. Our results areenabled by analyzing a new online learning setting with outcome-based feedback,which generalizes learning with feedback graphs. We provide an online learningalgorithm for this setting, of independent interest, with regret that growsonly logarithmically with the number of actions and linearly only in the numberof potential outcomes (the latter being very small in most auction settings).
展开▼